The perception of a melody is invariant to the absolute properties 
of its constituting notes, but depends on the relation between them 
— the melody's relative pitch profile. In fact, a melody's "Gestalt" is 
recognized regardless of the instrument or key used to play it. Pitch 
processing in general is assumed to occur at the level of the audi- 
tory cortex. However, it is unknown whether early auditory regions 
are able to encode pitch sequences integrated over time (i.e., melo- 
dies) and whether the resulting representations are invariant to 
specific keys. Here, we presented participants with different melodies 
composed of the same 4 harmonic pitches during functional mag- 
netic resonance imaging recordings. Additionally, we played the 
same melodies transposed in different keys and on different instru- 
ments. We found that melodies were invariantly represented by 
their blood oxygen level-dependent activation patterns in primary 
and secondary auditory cortices across instruments, and also 
across keys. Our findings extend common hierarchical models of 
auditory processing by showing that melodies are encoded indepen- 
dent of absolute pitch and based on their relative pitch profile as 
early as the primary auditory cortex. 
the auditory cortex. We expected neural response differences 
to the 2 melodies to be reflected in patterns of blood oxygen 
level-dependent (BOLD) activation rather than in mean signal 
as both melodies were matched in terms of low-level acoustic 
properties. For stimulation, we used a modified version of the 
Westminster Chimes, as this melody is easy to recognize and 
widely known. By dividing the entire pitch sequence 
in half, we obtained 2 perceptually distinct melodies. 
Both were matched in rhythm and based on the exact same 4 
harmonic pitches, but differed in temporal order, that is, in 
melodic Gestalt (Figs. 1 and 2). Playing both melodies on the 
same instrument and in the same key allowed us to test 
whether BOLD patterns represented a melody's "Gestalt", that 
is, its relative pitch profile. In addition, to test whether relative 
pitch-encoding was invariant with regard to absolute pitch, 
we played both melodies in a different key, transposed by 6 
semitones. Data were analyzed using multivariate pattern 
classification, as this method can be used to determine stat- 
istics of activation differences but also of commonalities 
(Seymour et al. 2009), which would be needed to test 
whether activity patterns of both melodies generalized across 
different keys and instruments. 



Introduction 

Most sounds we experience evolve over time. In accord with 
"Gestalt" psychology, we perceive more than just the sum of 
the individual tones that make up a melody. A temporal 
re-arrangement of the same tones will give rise to a new 
melody, but a given melodic "Gestalt" will remain the same 
even when all pitches are transposed to a different key 
(Ehrenfels 1890). The stable melodic percept therefore 
appears to emerge from the relationship between successive 
single notes, not from their absolute values. This relationship 
is known as relative pitch and is assumed to be processed via 
global melodic contour and local interval distances (Peretz 
1990). Theoretical models of auditory object analysis suggest 
that the integration of single auditory events to higher-level 
entities may happen at the stage of the auditory cortex 
(Griffiths and Warren 2002, 2004). Moreover, it has been 
suggested that auditory object abstraction, that is, perceptual 
invariance with regard to physically varying input properties, 
may emerge at the level of the early auditory cortex 
(Rauschecker and Scott 2009). However, so far there is no 
direct experimental evidence on encoding of abstract melodic 
information at early auditory processing stages. 

In the present study, we used functional magnetic reson- 
ance imaging (fMRI) and variations of 2 different melodies to 
investigate pitch-invariant encoding of melodic "Gestalt" in 



Materials and Methods 

Participants 

Eight volunteers (all non-musicians, 7 males, 1 female) aged between 
24 and 37 years with no history of hearing impairment participated in 
this study. All were given detailed instructions about the procedures 
and provided written informed consent prior to the experiments. The 
study was approved by the ethics committee of the University Hospital 
Tübingen. 



Auditory Stimuli 

Stimuli were generated using Apple's Garageband and post-processed 
using Adobe Audition. We employed 2 melodies, both comprising the 
same 4 pitches (E4, C3, D4, and G3). Both melodies were played on 
piano and flute. Additionally, for the piano we also transposed both 
melodies by 6 semitones downwards, resulting in 2 additional melodies 
comprising 4 different pitches (A#3, F#2, G#3, and C#3). Thus, 
altogether our experiment involved 6 melodic conditions. We chose a 
transposition distance of 6 semitones as this assured that chromae of 
all 4 pitches composing the transposed melodies were different from 
those played in the original key. Melodies were sampled at 44.1 kHz, 
matched in root-mean-square power and in duration (2 s, first 3 tones: 
312 ms, last tone: 937 ms; preceding silence period of 127 ms). Both 
auditory channels were combined and presented centrally via head- 
phones. Melodies were played using a tempo of 240 bpm. 
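
The choice of a 6-semitone transposition can be checked with a short sketch. It assumes MIDI note numbering (C4 = 60) and treats equal-tempered pitch classes as a stand-in for chroma; the helper names are illustrative and not part of the original stimulus pipeline.

```python
# Illustrative sketch: note names follow the stimulus description above;
# MIDI numbering (C4 = 60) and all helper names are assumptions.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def to_midi(note):
    """Convert a note name such as 'A#3' to a MIDI note number."""
    name, octave = note[:-1], int(note[-1])
    return 12 * (octave + 1) + PITCH_CLASSES.index(name)


def from_midi(number):
    """Convert a MIDI note number back to a note name."""
    return f"{PITCH_CLASSES[number % 12]}{number // 12 - 1}"


def transpose(notes, semitones):
    """Shift every note by the same number of semitones."""
    return [from_midi(to_midi(n) + semitones) for n in notes]


def chroma(notes):
    """Set of pitch classes (0-11) used by a list of notes."""
    return {to_midi(n) % 12 for n in notes}


original = ["E4", "C3", "D4", "G3"]
transposed = transpose(original, -6)

# Shifts (modulo 12) that share no chroma with the original key.
disjoint_shifts = [
    k for k in range(1, 12)
    if not chroma(original) & {(c + k) % 12 for c in chroma(original)}
]

print(transposed)       # ['A#3', 'F#2', 'G#3', 'C#3']
print(disjoint_shifts)  # [1, 6, 11]
```

Under these assumptions, only shifts of 1, 6, or 11 semitones (modulo the octave) leave the chroma sets of the two keys disjoint for this particular set of pitches.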

We performed a control experiment to rule out that any effect ob- 
served in the main experiment could be accounted for by the duration 
Training Prior to Scanning 

To ensure that all participants were able to recognize both melodies 
regardless of key and instrument, we conducted a simple 2-alternative 
forced-choice melody discrimination task prior to scanning. Each participant 
listened to a randomly ordered sequence containing all 6 
melodic stimuli and pressed 1 of 2 buttons to classify the melody as 
melody I or II. Participants spent 10-25 min performing this task 
until they felt comfortable in recognizing the 2 melodies. 
Figure 1. Melodies in piano-roll notation. Both stimuli are shown in the original key 
as well as in the transposed version, 6 semitones lower. Note that in the transposed 
condition all 4 pitches comprise different chromae as compared with the original key. 
A behavioral experiment prior to fMRI scanning assured that all participants were 
able to distinguish between both melodies. 
Figure 2. Time-frequency spectrograms of melodic stimuli. As expected, after 
transposition to a different key systematic frequency shifts between original and 
transposed melodies exist. Moreover, different distributions of harmonic power 
between piano and flute reflect their difference in timbre. 

of the last tone that was longer than the preceding 3 in all melodic 
sequences (c.f. Fig. 1). In the control experiment the durations of all 
tones were matched, with the same duration as the first 3 (312 ms) of 
the initial experiment. 



fMRI Measurements 

For each participant 6 experimental runs containing 343 volumes 
were acquired, plus 1 run for a separate sound localizer comprising 
226 volumes. Functional data were recorded on a Siemens 3T TIM 
Trio scanner using a T2*-weighted gradient echo-planar imaging (EPI) 
sequence. Functional images were acquired using a low-impact-noise 
acquisition fMRI sequence, which increases the dynamic range of the 
BOLD signal in response to acoustic stimuli (Seifritz et al. 2006). 
In short, this sequence elicits a quasi-continuous acoustic gradient 
noise that induces less scanner-related BOLD activity compared with 
conventional EPI sequences (which induce increased levels of audi- 
tory baseline activity due to their pulsed scanner noise). Functional 
volumes were acquired using the following parameters: gradient-recalled 
echo-planar acquisition sequence with 18 image slices, 3-mm 
slice thickness, 2100 ms volume repetition time, 20 × 20 cm field of 
view, 96 × 96 matrix size, 48 ms echo time, 80° flip angle, 1157 Hz 
bandwidth, resulting in an in-plane resolution of 2.1 × 2.1 mm². Slices were 
positioned such that the temporal cortex including Heschl's gyrus 
(HG), superior temporal sulcus (STS), and superior temporal gyrus 
(STG) was fully covered. For each participant, a structural scan was 
also acquired with a T1-weighted 1 × 1 × 1 mm³ sequence. 

Experimental Paradigm 

During data acquisition, stimuli were delivered binaurally at a comfor- 
table volume level using MRI compatible headphones. For each par- 
ticipant, 6 runs of data were acquired, each comprising 36 stimulus 
blocks. Each stimulus block consisted of randomly either 5 or 6 iden- 
tical melodic stimuli (each melody lasted 2 s, followed by 500 ms 
silence). The order of stimulus blocks was pseudo randomized and 
counterbalanced such that each of the 6 stimulus conditions was 
equally often preceded by all stimulus conditions. Thus, each run 
comprised 6 repetitions of all 6 melodic conditions. Blocks were 
separated by silence periods of 5 s. Preceding each run, 4 dummy 
volumes were acquired, and 1 randomly selected additional melodic 
block was included to ensure a stable brain state after the onset of 
each run. Dummies and the initial dummy block were removed prior 
to analysis. The functional localizer consisted of 16 randomly selected 
melody blocks of the main experiment, each separated by silence 
periods of 12.5 s. Participants were instructed to report how often a 
given melody was played within a stimulation block by pressing 1 of 
2 buttons of a MR compatible button box during the silence periods 
after each block (in main experiment and localizer). In the main 
experiment as well as in the functional localizer stimulation blocks 
were presented in a jittered fashion. 

fMRI Preprocessing and Univariate Analysis 

All neuroimaging data were preprocessed using SPM5 (http://www.fil.ion.ucl.ac.uk/spm/). 
Functional images were corrected for slice acquisition 
time, realigned to the first image using an affine transformation 
to correct for small head movements, unwarped to correct for EPI 
distortions, and spatially smoothed using an isotropic kernel of 3-mm 
full width at half-maximum. Preprocessed images of each run were 
scaled globally, high pass filtered with a cutoff of 128 s, and con- 
volved with the hemodynamic response function before entering a 
general linear model with one regressor of interest for each stimulus 
block. Additionally, regressors for SPM realignment parameters and 
the mean signal amplitude of each volume obtained prior to global 
scaling were added to the model. For the sound localizer a general 
linear model was fitted involving one regressor for the melody blocks 
and one regressor for the silence periods. Used as a univariate feature 
selection for further multivariate analysis, the t-contrast "sound versus 
silence" allowed us to independently rank voxels according to their 
response sensitivity to melodic stimuli [see Recursive Feature Elimin- 
ation (RFE) methods]. 



Multivariate Pattern Analysis 

Preprocessed functional data were further analyzed using custom soft- 
ware based on the MATLAB version of the Princeton MVPA toolbox 
(http://code.google.com/p/princeton-mvpa-toolbox/). Regressor beta 
values (one per stimulus block) from each run were z-score normalized, 
and outliers exceeding 2 standard deviations were clipped 
to that value. Data were then used for multivariate pattern analysis 
employing a method that combines machine learning with an 
iterative, multivariate voxel selection algorithm. This method was re- 
cently introduced as "RFE" (De Martino et al. 2008) and allows the 
estimation of maximally discriminative response patterns without an a 
priori definition of regions of interest. Starting from a given set of 
voxels, a training algorithm (in our case a support vector machine 
with a linear kernel as implemented in LIBSVM, 
http://www.csie.ntu.edu.tw/~cjlin/libsvm/) iteratively discards irrelevant 
voxels to reveal the informative spatial patterns. The procedure performs 
voxel selection on the training set only, yet increases classification 
performance on the test data. The method has proved to be 
particularly useful in processing data of the auditory system (Formisano 
et al. 2008; Staeren et al. 2009). Our implementation was the following: 
In a first step, beta estimates of each run (one beta estimate per stimu- 
lus block/trial) were labeled according to their melodic condition (e.g., 
melody I played by flute). Within each participant and across all runs, 
this yielded a total of 36 trials for each melodic condition. Sub- 
sequently, BOLD activation patterns were analyzed using the LIBSVM- 
based RFE. For each pair of melodic conditions (e.g., both melodies 
played by flute), trials were divided into a training set (30 trials per 
condition) and a test set (6 trials per condition), with training and test 
sets originating from different fMRI runs. The training set was used for 
estimating the maximally discriminative patterns with the iterative 
algorithm; the test set was only used to assess classification perform- 
ance of unknown trials (i.e., not used in the training). 
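
The normalization and the run-wise split described above can be sketched as follows. This is a minimal illustration, not the toolbox code: it assumes that z-scoring uses the population standard deviation of a list of beta values and that each cross-validation cycle holds out one complete run. The beta values are made up.

```python
from statistics import mean, pstdev


def zscore_and_clip(values, limit=2.0):
    """Z-score a list of beta values and clip them to +/- `limit` SD.

    Using the population SD is an assumption; the text does not specify it.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [max(-limit, min(limit, (v - mu) / sigma)) for v in values]


def leave_one_run_out(n_runs):
    """Yield (training_runs, test_run) pairs, one per cross-validation cycle."""
    for test_run in range(n_runs):
        yield [r for r in range(n_runs) if r != test_run], test_run


betas = [0.2, -0.1, 0.4, 0.0, 5.0, 0.3]  # made-up values with one outlier
normalized = zscore_and_clip(betas)

trials_per_condition_per_run = 6  # 36 trials per condition over 6 runs
for train_runs, test_run in leave_one_run_out(6):
    n_train = len(train_runs) * trials_per_condition_per_run
    n_test = trials_per_condition_per_run
    print(f"test run {test_run}: {n_train} training / {n_test} test trials per condition")
```

With 6 runs and 6 trials per condition per run, each cycle yields the 30 training and 6 test trials per condition stated in the text.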

All analyses started from the intersection of voxels defined by an 
anatomically delineated mask involving both hemispheres (including 
temporal pole, STG, STS, and insula, Supplementary Material) and 
sound-responsive voxels identified in the separate localizer experiment 
(t-contrast sound vs. silence). Voxels falling into the anatomical 
mask were ranked according to their t-values and the most responsive 
2000 were selected. Compared with recent studies employing RFE in 
combination with auditory fMRI data, a starting set of 2000 voxels is 
still the upper limit for this type of analysis (De Martino et al. 2008; 
Formisano et al. 2008; Staeren et al. 2009). Due to noise, in large 
voxel sets the RFE method is prone to incorrectly labeling potentially 
informative voxels with too low weights, thus discarding them early 
in the iteration cycle, which ultimately leads to suboptimal classifi- 
cation of the test-set. To minimize this problem, we needed to further 
preselect the voxel set prior to RFE, which we wanted to do on the 
basis of classifier performance (within the training set) rather than on 
t-values alone. To this end, we carried out stepwise cross-validated 
(leave-one-run-out) classifications, each time removing the 4% of 
voxels with the lowest t-values obtained in the sound localizer. 
Similar to the RFE approach, we subsequently selected the voxel set 
with the peak classification accuracy for further analysis. In contrast 
to RFE, however, we still used each voxel's sound responsiveness (assessed 
by the functional sound localizer) as its discard criterion and 
thus avoided the rejection of potentially informative data by noise- 
driven low SVM weights. This initial feature selection procedure did 
not involve any testing data used for the subsequent RFE but exclu- 
sively independent data. We set the stop criterion of the initial feature 
selection method at 1000 voxels to get a lower voxel limit for the RFE 
analysis as employed previously (Staeren et al. 2009). The final voxel 
population on which the RFE analysis started ranged therefore from 
2000 to 1000 voxels. We employed 6 cross-validation cycles, each involving 
different runs for training and testing. For each of these cross-validations, 
10 RFE steps were carried out, each discarding 40% of 
voxels. Crucially, classification performance of the current set of 
voxels was assessed using the external test set. The reported correct- 
ness for each binary comparison was computed as an average across 
the 6 cross-validations. Single-participant discriminative maps corre- 
sponded to the voxel-selection level that gave the highest average cor- 
rectness. These maps were sampled on the reconstructed cortex of 
each individual participant and binarized. 
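
To make the two elimination schedules concrete, the sketch below lists the voxel-set sizes they would pass through. It assumes that each percentage refers to the current voxel set and that counts are rounded down; neither detail is stated in the text, and the classifier-based choice of the best set is not modeled here.

```python
def preselection_sizes(start=2000, stop=1000, fraction=0.04):
    """Set sizes when repeatedly dropping `fraction` of the current voxels.

    Assumption: the fraction applies to the current set, rounded down,
    and the procedure halts once `stop` voxels are reached.
    """
    sizes = [start]
    while sizes[-1] > stop:
        remaining = sizes[-1] - max(1, int(sizes[-1] * fraction))
        sizes.append(max(stop, remaining))
    return sizes


def rfe_sizes(start, steps=10, fraction=0.40):
    """Set sizes over `steps` RFE iterations, each discarding `fraction`."""
    sizes = [start]
    for _ in range(steps):
        sizes.append(max(1, sizes[-1] - int(sizes[-1] * fraction)))
    return sizes


pre = preselection_sizes()
print(len(pre) - 1, "preselection steps from", pre[0], "to", pre[-1])

# RFE could start anywhere between 2000 and 1000 voxels; both bounds are shown.
for start in (2000, 1000):
    print(start, "->", rfe_sizes(start))
```

Under these assumptions, 10 steps at 40% reduce even the largest starting set to a few dozen voxels at most, which is why the voxel-selection level with the highest average accuracy is reported rather than the final set.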

To compare classification performances between left and right 
hemispheres, we additionally ran the same RFE analysis on each 
hemisphere separately. To match the number of starting voxels with 
those of the analysis involving both hemispheres, we separately 
defined the anatomical region of interest (ROI) for each hemisphere 
and also ranked voxels separately for a given hemisphere according 
to their t-values in the sound localizer map. Thus, also within 
each hemisphere, the final voxel population on which further analysis 
started ranged from 2000 to 1000 voxels. 

Discriminative Group Maps 

To examine the spatial consistency of the discriminative patterns 
across participants, group-level discriminative maps were generated 
after cortex-based alignment of single-participant discriminative maps 
(Fischl et al. 1999). For a given experiment, the binary single- 
participant maps were summed up, and the result was thresholded 
such that only vertices present in the individual discriminative maps 
of at least 5 of the 8 participants survived. A heat-scale indicates con- 
sistency of the voxel patterns distinguishing a given experimental con- 
dition pair in that the highest values correspond to vertices selected 
by all participants. As each individual discriminative map only con- 
tributes voxels that survived the recursive feature elimination, this 
group map can be interpreted as a spatial consistency measure across 
participants. 
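
The summation and thresholding step can be sketched in a few lines. The example assumes that the aligned single-participant maps are available as equal-length lists of 0/1 values, one entry per surface vertex; the toy maps below are invented for illustration.

```python
def group_consistency_map(binary_maps, min_participants=5):
    """Sum binary maps per vertex and zero out vertices below the threshold."""
    counts = [sum(column) for column in zip(*binary_maps)]
    return [c if c >= min_participants else 0 for c in counts]


# Made-up maps: 8 participants (rows) x 6 vertices (columns).
toy_maps = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
]

group_map = group_consistency_map(toy_maps)
print(group_map)  # [8, 5, 0, 0, 5, 0]
```

Surviving values range from 5 to 8 and correspond to the heat scale: a value of 8 marks a vertex selected in every participant.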

Tests for Lateralization Biases 

To detect possible lateralization biases we tested classification per- 
formances from left and right hemispheres against each other on the 
group level using a one-sample t-test. Additionally, we tested for a 
possible hemispheric bias during selection of discriminative voxels by 
the RFE analysis that involved data of both hemispheres. To this end, 
we compared the numbers of discriminative voxels within each hemi- 
sphere across the group. For each participant and classification, we 
calculated the lateralization index as the difference between the 
number of voxels of left and right hemisphere divided by the number 
of voxels selected in both hemispheres. A lateralization index of 1 or 
-1 thus means that all voxels selected by the RFE fell into 1 hemisphere. 
An index of 0 indicates that there was no lateralization at all. 
We then tested for systematic lateralization biases across the group. 
For generalization across keys and instruments voxel counts of both 
classification cycles (see RFE methods) were combined. 
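
The lateralization index defined above is simple enough to state directly. The sketch below follows the definition in the text (left minus right, divided by the total); the function name and the example counts are illustrative.

```python
def lateralization_index(n_left, n_right):
    """(left - right) / (left + right) for the voxels selected by RFE."""
    total = n_left + n_right
    if total == 0:
        raise ValueError("no voxels selected in either hemisphere")
    return (n_left - n_right) / total


print(lateralization_index(40, 0))   # 1.0: all voxels in the left hemisphere
print(lateralization_index(0, 40))   # -1.0: all voxels in the right hemisphere
print(lateralization_index(20, 20))  # 0.0: no lateralization
```

With this sign convention, positive values indicate a left-hemisphere bias and negative values a right-hemisphere bias.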



Results 

De-coding Melodies From Voxel Patterns 

Initially, we examined whether both melodies, played in the 
same key and on the same instrument could be distinguished 
by their corresponding BOLD patterns. We trained the classi- 
fier separately on piano or flute trials, respectively, and 
applied a leave-one-out cross-validation approach across runs 
to test the classifier on independent runs (see Materials and 
Methods). Within both instruments melodies were classified 
significantly above chance (piano: 0.65, P = 1.22 × 10⁻⁵; 
flute: 0.63, P = 4.08 × 10⁻⁵; 1-tailed t-test, n = 8), indicating 
that the BOLD patterns were melody-specific for these 
stimuli (see Fig. 3A). To illustrate the anatomical consistency 
of discriminative voxel populations across participants we 
generated a group-level map showing only voxels that 
coincided anatomically in at least 5 out of 8 participants. 
Figure 3. De-coding results based on fMRI voxel patterns of early auditory cortex. 
(A) Performance of classification between melodies, tested separately on 2 instruments. 
(B) Classification performance between melodies played on different instruments and 
different keys. All results were confirmed using a sub-sample of 5 participants and 
adapted versions of our melodic stimuli where all pitches were of matched duration 
(Supplementary Table S2). *P < 0.05, Bonferroni corrected for 4 comparisons. 
Figure 4. Multiparticipant consistency maps of discriminative voxels shown on a 
standard brain surface of the temporal lobes. The yellow outline illustrates the 
anatomically defined region of interest from which all analyses started. All group 
maps were created by summation of the individual discriminative voxel maps of all 8 
participants. Each map was thresholded such that a voxel had to be selected in at 
least 5 individuals to appear on the group map. Color coding indicates consistency 
across participants. Across all classifications no lateralization bias was found (cf. 
Supplementary Table S1). For the group map of melody classification on piano see 
Supplementary Figure S1. 



Figure 4 and Supplementary Figure S1 show that the most 
consistent discriminative voxel patterns span bilaterally from 
lateral HG into the Planum Temporale (PT). 

Invariance to Instrument and Key 

To examine the influence of timbre on melody classification, 
in a next step, we trained the classifier to distinguish both me- 
lodies played on one instrument and tested it using the melo- 
dies played on the other instrument. To assure that each 
instrument was once used for training and once for testing, 
classification was conducted twice and accuracies of both 
turns were averaged. Figure 3B shows classification results 
across instruments. Despite substantial differences in energy 
distribution and frequency spectra between both instruments, 
this classification also succeeded significantly above chance 
(0.58, P = 6.28 × 10⁻⁵; 1-tailed t-test, n = 8). Again, the inspection 
of the corresponding discriminative group maps revealed 
a distribution of discriminative voxels that is consistent 
with our previous results, spanning bilaterally in HG and PT 
(Fig. 4B). 
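
The train-on-one, test-on-the-other logic can be sketched with a deliberately simple stand-in classifier. The example below uses a nearest-centroid rule instead of the linear SVM with RFE used in the study, and the three-"voxel" patterns are invented: the first two values carry the melody, the third carries an instrument-specific offset.

```python
def centroid(patterns):
    """Mean pattern across trials."""
    return [sum(col) / len(col) for col in zip(*patterns)]


def train(patterns_by_label):
    """Nearest-centroid 'model': one mean pattern per melody label."""
    return {label: centroid(p) for label, p in patterns_by_label.items()}


def predict(model, pattern):
    """Label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(pattern, c))
    return min(model, key=lambda label: dist(model[label]))


def accuracy(model, patterns_by_label):
    """Fraction of trials assigned to their true melody label."""
    hits = total = 0
    for label, patterns in patterns_by_label.items():
        for p in patterns:
            hits += predict(model, p) == label
            total += 1
    return hits / total


def cross_condition_accuracy(cond_a, cond_b):
    """Train on A and test on B, then swap roles and average both accuracies."""
    a_to_b = accuracy(train(cond_a), cond_b)
    b_to_a = accuracy(train(cond_b), cond_a)
    return (a_to_b + b_to_a) / 2


# Made-up trial patterns for melodies I and II on each instrument.
piano = {"I": [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4]],
         "II": [[0.0, 1.0, 0.5], [0.1, 0.9, 0.6]]}
flute = {"I": [[1.0, 0.1, -0.5], [0.8, 0.0, -0.4]],
         "II": [[0.1, 1.0, -0.5], [0.0, 0.8, -0.6]]}

print(cross_condition_accuracy(piano, flute))  # 1.0 for this toy data
```

Because the instrument offset shifts both melodies equally, it does not change which centroid is closer, so the toy classifier generalizes across instruments. The same function applies unchanged when the two conditions are keys rather than instruments.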

Subsequently, we examined whether BOLD activation pat- 
terns preserved melody-specific information across different 
keys. Note that after transposition, the only common property 
that characterized the 2 melodies as identical was their relative 
change in pitch height evolving over time, that is, their 
melodic "Gestalt". We trained the classifier on both melodies 
played in one key and tested it on the same melodies transposed 
by 6 semitones to a different key (see Materials and 
Methods). To use both keys once for testing and once for 
training we again conducted this classification twice and aver- 
aged both performances. Figure 3B shows that classification 
across keys (0.58, P = 2.60 × 10⁻⁴; 1-tailed t-test, n = 8) succeeded 
significantly above chance. This implies that BOLD 
patterns do not only represent differences in absolute pitch 
but that they also code for relative pitch, that is, information 
that is necessary for the concept of melodic "Gestalt". The 
inspection of the corresponding discriminative group maps 
revealed a distribution of discriminative voxels spanning bilat- 
erally in HG and PT (Fig. 4C). 
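
As a quick consistency check, the four reported P-values can be compared against the Bonferroni threshold mentioned in the caption of Figure 3 (0.05 corrected for 4 comparisons). The P-values are taken from the text; the dictionary labels are only descriptive.

```python
# P-values as reported in the Results; labels are descriptive only.
reported_p = {
    "melody (piano)": 1.22e-5,
    "melody (flute)": 4.08e-5,
    "across instruments": 6.28e-5,
    "across keys": 2.60e-4,
}

alpha = 0.05
threshold = alpha / len(reported_p)  # Bonferroni-corrected threshold

survives = {name: p < threshold for name, p in reported_p.items()}
for name, ok in survives.items():
    print(f"{name}: P = {reported_p[name]:.2e}, significant after correction: {ok}")
```

All four classifications fall below the corrected threshold of 0.0125.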

Tests for Lateralization Effects Between Left and Right 
Hemispheres 

To examine potential lateralization effects in coding of 
melodic "Gestalt", we compared the classification per- 
formances obtained during separate analysis of left and 
right hemispheric ROIs. This however did not reveal any 
systematic differences between hemispheres [2-tailed t-test; 
Melody Classification t(15) = 0.52, P = 0.61; Instruments 
t(7) = 0.04, P = 0.97; Keys t(7) = 0.81, P = 0.45; cf. Supplementary 
Figure S2]. Moreover, we tested for a potential selection 
bias during RFE analysis on the joint ROI with voxels of both 
hemispheres. However, this analysis also did not reveal any 
systematic preference towards either hemisphere; that is, 
voxels of both hemispheres were equally likely to be selected 
during RFE (2-tailed t-test; Melody Classification t(15) = -0.09, 
P = 0.93; Instruments t(7) = 0.89, P = 0.40; Keys t(7) = -0.89, 
P = 0.40; cf. Supplementary Table S1). 

In a last step, using a subset of 5 of our 8 participants, we 
examined whether the duration of the last tone (which was 
longer than the remaining 3) could have affected our results. 
Note that this would have been relevant only for 
distinguishing non-transposed melodies, as for the transposed 
ones low-level pitch information could not have provided discriminative 
cues. In any case, we were able to replicate all results 
with similar de-coding performances using adapted melodies 
with matched duration of all pitches (Supplementary 
Table S2). 

Localizing Human Primary Auditory Cortex 

To provide an objective measure for the extent of overlap 
between discriminative voxels and the anatomical location of 
primary auditory cortex (PAC), we related our results to the 
histologically defined areas Te1.0, Te1.1, and Te1.2 (Morosan 
et al. 2001). Figure 5A shows all 3 of these areas (at a prob- 
ability threshold of 30%) on the standard surface used for all 
group analyses in this study. There was a close overlap 
between all cytoarchitectonically defined primary regions and 
the anatomical landmarks of HG. To directly compare the 3 
anatomically defined core regions with our results, we show 
their common outline on top of an average of all group maps 
obtained by all classifications of the 2 melody conditions 
(Fig. 5B). Even though substantial parts of this map extend to 
PT, there is a high degree of agreement between the histologi- 
cally defined PAC and the average discriminative voxel-maps. 

Discussion 

We examined neural representations of melodic sequences in 
the human auditory cortex. Our results show that melodies 
can be distinguished by their BOLD signal patterns as early as 
in HG and PT. As our melodies differed only in the sequence, 
but not identity of pitches, these findings indicate that the 
temporal order of the pitches drove discriminative pattern 
Figure 5. (A) Spatial overlap between the cytoarchitectonically defined PAC (areas 
Te 1.0, 1.1, and 1.2, probabilistic threshold: 30%) and an average group map of 
discriminative voxels for all melody classifications (B). Substantial overlap exists 
between our results and areas Te. 



formation. Importantly, our results show that the voxel pat- 
terns were diagnostic for melodies also when they were 
played on different instruments, and even when they were 
transposed by 6 semitones into a different key. Our findings 
therefore suggest that melodic information is represented as 
relative pitch contour, invariant to low-level pitch or timbre 
information, in early auditory cortex. 

By definition, a melody consists of several pitches, inte- 
grated across time. Previous evidence points towards a role of 
a region anterior to HG in the processing of pitch changes. 
Activity in this area was found to correlate with the amount 
of frequency change over time (Zatorre and Belin 2001). 
Equally, univariate contrasts between melodic stimuli (simply 
defined as variations of pitch over time) and frequency 
matched noise, fixed pitch (Patterson et al. 2002) or silence 
(Brown et al. 2004) did activate this area. However, univariate 
contrasts between different melodic excerpts (i.e., random vs. 
diatonic melodies), did not lead to differential activation there 
or in any other brain area (Patterson et al. 2002). Thus, the 
role of regions anterior to HG in differential melody encoding 
remains elusive. Since some voxels anterior to HG were also 
active in our sound localizer contrast "sound versus silence", 
they were included in the RFE analysis (c.f. representative par- 
ticipant in Supplementary Figure S3). However, voxels of this 
region turned out to be non-discriminative during melody 
classification. Even though this null finding does not necess- 
arily imply a general lack of melody-specific information in 
this area (Bartels et al. 2008), we found that among all audi- 
tory regions examined, only BOLD patterns of PAC and PT 
held sufficient information to distinguish the 2 highly con- 
trolled melodies that consisted of permutations of 4 identical 
pitches. When played on the same instrument the only feature 
that differed between our melodies was their relative pitch 
profile, that is, their melodic "Gestalt". Importantly, melodic 
information in HG and PT was not bound to a specific key 
but, like perceptual melodic "Gestalt", also generalized across 
different keys. Relative pitch information in HG and PT thus 
appears to be independent of absolute frequency. 

Another important question concerns hemispheric asym- 
metry. Our results did not show any hemispheric bias in 
melodic processing. Neither were we able to detect systematic 
differences in classification performances between left and 
right hemispheres (Supplementary Figure S2) nor did the RFE 
analysis preferably recruit voxels from one hemisphere when 
employed on a joint ROI including voxels of both hemispheres 
(Supplementary Table S1). The lack of lateralization 
found here stands in contrast to previous evidence that 
points towards a hemispheric specialization regarding specific 
aspects of melody processing. Patient studies suggest a differ- 
ential specialization for processing of melodic contour and in- 
terval distance in right and left hemispheres, respectively 
(Peretz 1990). This principle of lateralization was also found 
by a recent fMRI study, however with reversed roles for left 
and right hemispheres (Stewart et al. 2008). Additionally, la- 
teralization effects regarding differences in spectral resolution 
in the pitch domain were reported (Hyde et al. 2008). 

We should point out that our evidence is nevertheless in no 
conflict with that of prior literature, firstly as our stimuli 
differ, and secondly as we used an entirely distinct approach 
to analyze data. In the present study, the 2 melodies varied 
both in their melodic contours and in their interval profiles. 
Hemispheric specialization regarding these 2 dimensions 
would therefore predict the involvement of both auditory cortices 
(Peretz 1990). The same idea applies to spectral lateralization. 
The smallest interval distance used in the present 
study was 2 semitones. Activation biases toward the right 
auditory cortex were however only reported when stimuli in- 
volved smaller pitch distances and both sides responded to a 
similar extent when stimuli involved an interval distance of 2 
semitones (Hyde et al. 2008). The general lack of lateraliza- 
tion biases in pitch processing is moreover compatible with 
recent findings of another group (Hall and Plack 2009) but 
contradicts former studies reporting musical pitch processing 
being lateralized to the right PT (Patterson et al. 2002; Zatorre 
et al. 2002). In the context of voxel pattern classification, it 
should also be kept in mind that callosal connections can 
transfer information from one hemisphere to the other. In the 
visual cortex, for example, it has been shown that voxel pat- 
terns of the un-stimulated hemisphere can encode the at- 
tended motion-direction of the stimulated hemifield (Serences 
and Boynton 2007). 

Pitch processing is generally associated with lateral HG. For 
example, based on iterated rippled noise (IRN) stimuli several 
studies have shown increased activation in this region associ- 
ated with pitch (Griffiths et al. 1998; Patterson et al. 2002; Hall 
et al. 2005). However, recent evidence suggests that not pitch 
saliency per se but spectro-temporal modulations correlated 
with pitch saliency may be the feature of IRN stimuli that acti- 
vates lateral HG (Hall and Plack 2009; Barker et al. 2012). It is 
moreover still unclear whether binaural pitch or pure tones in
noise activate lateral HG (Hall et al. 2005; Hall and
Plack 2009). Thus, the role of lateral HG in pitch processing is 
at present still under debate. As we aimed to examine the pro- 
cessing of melodic patterns rather than that of pitch per se, in 
the present study we exclusively composed our melodies using 
harmonic pitches. Additionally, to examine melodic "Gestalt" 
independently of absolute pitch, we composed both melodies 
based on permutations of identical pitches and generalized 
relative pitch information across different keys. Our results are 
thus not directly comparable with studies of pitch processing 
that typically contrast a given pitch stimulus with control 
stimuli that do not generate a pitch percept, or different
pitch-eliciting stimuli associated with varying pitch salience. Yet our
findings are generally compatible with evidence for pitch rep- 
resentations in early auditory cortex including lateral HG as all 
discriminative voxel maps of melodic representations (i.e., a 
variation of pitch over time) involved patterns spanning from 
HG to PT (Fig. 4). 
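
The logic of generalizing melodic information across keys can be illustrated with a minimal sketch on simulated data. It is not the analysis pipeline of the present study: the voxel patterns are synthetic, the assumption that a key change adds a fixed pattern offset is a simplification, and the correlation-based nearest-centroid classifier and all names (make_patterns, nearest_centroid_predict) are hypothetical choices made for illustration. A classifier is trained on patterns evoked by 2 melodies in one key and tested on patterns evoked by the same melodies in another key.

```python
import numpy as np

def make_patterns(rng, melody_signal, key_offset, n_trials, noise=0.5):
    """Simulate trials as melody-specific signal + key-specific offset + noise."""
    n_voxels = melody_signal.size
    return melody_signal + key_offset + noise * rng.standard_normal((n_trials, n_voxels))

def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test pattern to the class whose mean training pattern
    it correlates with most strongly."""
    labels = np.unique(train_y)
    centroids = np.stack([train_x[train_y == lab].mean(axis=0) for lab in labels])
    # Center each pattern across voxels so the dot product becomes a correlation.
    tz = test_x - test_x.mean(axis=1, keepdims=True)
    cz = centroids - centroids.mean(axis=1, keepdims=True)
    corr = (tz @ cz.T) / (
        np.linalg.norm(tz, axis=1, keepdims=True) * np.linalg.norm(cz, axis=1)
    )
    return labels[np.argmax(corr, axis=1)]

# Synthetic data only; none of these values come from the study.
rng = np.random.default_rng(0)
n_voxels, n_trials = 50, 20
signal_a = rng.standard_normal(n_voxels)      # pattern component for melody A
signal_b = rng.standard_normal(n_voxels)      # pattern component for melody B
key_1 = 0.5 * rng.standard_normal(n_voxels)   # assumed additive effect of key 1
key_2 = 0.5 * rng.standard_normal(n_voxels)   # assumed additive effect of key 2
labels = np.array([0] * n_trials + [1] * n_trials)

# Train on key 1, test on key 2.
train_x = np.vstack([make_patterns(rng, signal_a, key_1, n_trials),
                     make_patterns(rng, signal_b, key_1, n_trials)])
test_x = np.vstack([make_patterns(rng, signal_a, key_2, n_trials),
                    make_patterns(rng, signal_b, key_2, n_trials)])

predicted = nearest_centroid_predict(train_x, labels, test_x)
accuracy = float(np.mean(predicted == labels))
print(f"cross-key accuracy: {accuracy:.2f}")
```

Above-chance accuracy in such a scheme indicates that the information distinguishing the 2 melodies survives the change of key, which is the sense in which a representation can be called key-invariant.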

The representation of melodic "Gestalt" in early auditory 
cortex is compatible with patient studies pointing towards a 
role for HG in processing of musical intervals (Stewart et al. 
2006). For example, one study reported deficits in sensitivity to 
relative pitch in a patient with a unilateral lesion of HG, while 
another patient with intact HG but lesions in rostral auditory 
association cortex did not show these deficits (Peretz et al. 
1994). Nevertheless, there is also good clinical evidence that 
regions beyond HG play a role in melody processing 
(Liegeois-Chauvel et al. 1998; Stewart et al. 2006). Deficits in 
melody perception and the analysis of musical intervals are thus 
not only associated with lesions posterior to HG and PT but 
also with those of the parieto-temporal junction or lesions in 
STG, lying anterior to HG (Stewart et al. 2006). Moreover, a 
recent neuroimaging study suggested a role for intra-parietal 
sulcus in active melody transposition (Foster and Zatorre 2010). 

As the present study was exclusively focused on early audi- 
tory areas we only scanned the temporal lobe and examined 
sound-responsive voxels therein (see Methods). Thus, to fully
understand the cortical network underpinning the processing 
of relative pitch, further experiments are desirable. In particu- 
lar, future data might also help clarify whether the melodic 
information we observed in BOLD patterns of HG originated
in HG, was conveyed to it by ascending sensory activity, or
was fed back to HG from higher-level brain areas.
In the visual cortex, for example, it is known that the primary 
visual cortex is modulated by high-level object or motion 
information and that its patterns can encode high-level object 
information (Williams et al. 2008). 

Nevertheless, our findings challenge a recent study that 
attributes the extraction of melodic contour exclusively to 
higher areas (Lee et al. 2011). In fact, a major part of auditory 
feature selection is assumed to be completed at subcortical 
stages already (Nelken 2004). As PAC lies between inferior 
colliculus and secondary auditory areas, it has been proposed 
to be a likely locus where representations of physical low- 
level sound properties may be transformed to behaviorally rel- 
evant representations (Nelken 2008). Growing evidence from 
animal models suggests that the responses in PAC depend on 
rather long time windows of integration that span from 
seconds to tens of seconds, a time frame that appears too 
long for processing of simple acoustic features (Chait et al. 
2007). In line with this, frequency-specific short-term memory
has been demonstrated in the human early auditory cortex
(Linke et al. 2011). These findings provide a crucial prerequisite
for melodic representations in the auditory cortex, as any
mechanism for relative pitch extraction, by nature, has to rely 
on the temporal integration of at least 2 pitches. 
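
The dependence of relative pitch on at least 2 successive pitches can be made concrete with a short sketch. The MIDI note numbers below are hypothetical and are not the stimuli used in the study; they merely form 2 orderings of the same 4 pitches with a smallest step of 2 semitones. The helper names (interval_profile, contour, transpose) are likewise illustrative. The sketch shows that the interval profile and the contour are defined only over pairs of consecutive notes, that both are unchanged by transposition, and that they differ between the 2 orderings.

```python
def interval_profile(midi_pitches):
    """Signed semitone steps between consecutive notes."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]

def contour(midi_pitches):
    """Direction of each step: +1 up, -1 down, 0 for a repeated note."""
    return [(step > 0) - (step < 0) for step in interval_profile(midi_pitches)]

def transpose(midi_pitches, semitones):
    """Shift every note by the same number of semitones (change of key)."""
    return [p + semitones for p in midi_pitches]

# Hypothetical MIDI note numbers, chosen for illustration only.
melody_a = [64, 60, 62, 55]
melody_b = [55, 62, 64, 60]  # same 4 pitches in a different temporal order

# Absolute pitch content is identical across the 2 melodies.
same_pitch_set = sorted(melody_a) == sorted(melody_b)

# Relative pitch is unchanged by transposition but differs between melodies.
key_invariant = interval_profile(transpose(melody_a, 6)) == interval_profile(melody_a)
melodies_differ = interval_profile(melody_a) != interval_profile(melody_b)

print(same_pitch_set, key_invariant, melodies_differ)
```

A single note carries no interval or contour information at all, so any such representation presupposes that earlier pitches are retained while later ones arrive.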

Taken together, even though potentially influenced by 
input from higher-level areas, our results extend common 
hierarchical models of pitch sequence processing that attribute
the extraction of relative pitch exclusively to higher-level
areas along the auditory neuraxis (Zatorre et al. 1994;
Schiavetto et al. 1999; Patterson et al. 2002). We do not know 
whether relative pitch information in the early auditory cortex 
is of implicit quality, that is, read out by higher cognitive 
areas to give rise to the percept of melodic "Gestalt", similar 
to the way that periodicity information in the cochlear 
nucleus is assumed to be read out by higher areas to give rise 
to the percept of pitch. Nevertheless, the present study provides,
for the first time, direct evidence that relative pitch information,
corresponding to the concept of melodic "Gestalt", is
represented at the level of the early auditory cortex.